import pandas as pd
import numpy as np
from lets_plot import *
from palmerpenguins import load_penguins
LetsPlot.setup_html()
= load_penguins() data
Data Visualization with Lets Plot
Introduction
“The simple graph has brought more information to the data analyst’s mind than any other device.” — John Tukey
This chapter introudces data visualization with the Lets Plot library. We briefly discus the grammar of graphics which is a useful paradigm for understanding the fundamentals of building graphs. Then we introudce the basics of Lets Plot and provide resources for further development.
Prerequisites
For a slightly more interesting introduction, let’s look at the penguins dataset built into the lets_plot data library. This dataset contains measurements on species of penguins.
::: {.callout-caution, collapse=“false”} #### Caution R has a package that uses ggplot
and there is also another Python library plotnine
that uses the same ggplot
syntax. If you don’t specify lets_plot
when using ChatGPT, you may be led down the wrong code path. It is recommended you only use the ChatGPT that was designed for this course. :::
First, load the data by calling load_penguins
from the palmerpenguins
library as follows:
To explore the relationship between bill length and bill depth, we can start with a basic scatterplot.
# sometime you have to run this cell twice to get the plot to show
(
ggplot(data) + geom_point(aes(x='bill_length_mm', y='bill_depth_mm'), color='blue')
+ ggtitle("The relationship between bill length and bill depth")
)
When all species are lumped together in this scatterplot it doesn’t look like there is much of a relationship between the bill length and bill depth. We can improve our visualization by coloring the points based on species.
(
ggplot(data) + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species'))
+ ggtitle("The relationship between bill length and bill depth by species")
)
We can also easily change the shape and size of the points in a scatterplot.
(
ggplot(data) + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm'))
+ ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
)
Now that’s going overboard! But hopefully the basic syntax for using lets_plot
makes sense.
Making Lets-Plot More Presentable
In this section, we look at how to customize charts to be more informative and presentable. For example, column names in a dataset are rarely a good idea to present to someone not as intimately familiar with the data as you. We may also wish to highlight certain points, or draw attention to areas on a graph.
To begin, let’s return to a reasonable visualization of the penguins data. We will start by naming our ggplot
chart object “plot” and changing the X and Y axis labels.
The way is to include the labs
function in the plot inputs. The arguments in this function are the axis names and their desired labels.
= (
plot
ggplot(data)+ geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm'))
+ ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
+ labs(x='Bill Length (mm)', y='Bill Depth (mm)', color='Species', shape='Species', size='Flipper Length (mm)')
)
plot
The next method makes the same adjustments but modifies the chart “post hoc”. Start with a very basic chart, “plot” and use the +
operator to add layers and modify titles.
= (
plot
ggplot(data)+ geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm'))
+ ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
)
= plot + labs(x='Bill Length (mm)', y='Bill Depth (mm)', color='Species', shape='Species', size='Flipper Length (mm)')
plot plot
The 2 approaches above have the same outcome, but the latter example introduces a flexible lets_plot
paradigm that can be extended to other chart additions and modifications. For example, if we want to add a reference line to “plot”, we can use the geom_hline()
function.
= plot + geom_hline(yintercept=7, linetype='dotted', color='black')
plot plot
We can add several different shapes, including circles, lines or rectanges using the .add_shape()
method. This method specifies what type of shape to add to the graph given a set of coordinates (x0, x1, y0, y1). “.add_shape()” can be used to draw reference lines as well, but still requires a all 4 coordinates.
Here is a simple example first:
# Adds a horizontal line
import pandas as pd
from lets_plot import *
LetsPlot.setup_html()
# Sample data with correct data types
= {'x': [1.0, 2.0, 3.0, 4.0, 5.0], 'y': [5.0, 4.0, 3.0, 2.0, 1.0]}
data = pd.DataFrame(data)
df
# Ensure data types are correct
'x'] = df['x'].astype(float)
df['y'] = df['y'].astype(float)
df[
# Create scatter plot
= ggplot(df, aes('x', 'y')) + geom_point()
plot
# Add a rectangle shape using direct numeric values
+= geom_rect(xmin=2.0, xmax=4.0, ymin=1.0, ymax=3.0, fill='red', alpha=0.3)
plot
# Display the plot
plot.show()
Now here is it applied to our penguin data:
# Load the penguins dataset
= load_penguins()
penguins
# Create scatter plot
= (
plot
ggplot(penguins)+ geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm'))
+ ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
)
# Add labels
= plot + labs(x='Bill Length (mm)', y='Bill Depth (mm)', color='Species', shape='Species', size='Flipper Length (mm)')
plot
# Add a rectangle shape using direct numeric values
+= geom_rect(xmin=33.0, xmax=45.0, ymin=16.0, ymax=22.0, fill='red', alpha=0.3)
plot
# Display the plot
plot
The code above introduces the geom_rect
features, it is not always necessary for every situation. But hopefully this gives a flavor of what can be done.
Notice also, that the geom_
methods actually update plot
. No need to overwrite the original object or create a new chart object for each modification.
Other geom_*
itions
There are several other useful “post hoc” graph modifications that we only mention in this chapter. For further exploration, see Lets-Plot Documentation.
Use geom_text()
to add text annotations at specific locations. Update axes by using scale_x_continuous()
or scale_y_continuous()
which allows you to modify gridlines and add units of measure like “%” or “$”. geom_vline()
and geom_hline()
allow you to add vertical or horizontal reference lines. The possibilities are almost endless!
Resources
This page has introduced the basics of lets_plot
, but we have only looked at a scatterplot. For links to further documentation and a whole gallery of lets_plot
possibilities, see Lets-Plot Documentation.